feat(ai): inference API #501
Open
kallebysantos wants to merge 8 commits into supabase:develop from kallebysantos:feat-ort-inference-api
Conversation
🔕 This PR is stale because it has been open 60 days with no activity. Remove stale label or comment or this will be closed in 7 days.
- Exposing a user-friendly interface to consume the `onnx` backend
- Using `InferenceAPI` to perform `text-to-audio`
- Encoding `wave` audio tensors from the Rust land
- Documenting the "magic numbers" of the `text-to-audio` example ([original paper](https://arxiv.org/pdf/2306.07691))
- Adding a `fromStorage` method to `InferenceAPI`; it allows model loading from Supabase Storage with public/private bucket support
What kind of change does this PR introduce?
feature
What is the current behavior?
Since PR #436, it has been possible to use `onnx` inference by calling `globalThis[Symbol.for('onnxruntime')]`.
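For comparison, the pre-existing access pattern looks roughly like this (a sketch; the shape of the object behind the Symbol is not detailed here):

```ts
// The ONNX runtime backend is reached through a well-known Symbol on globalThis,
// rather than through a dedicated, typed API.
const onnxruntime = (globalThis as any)[Symbol.for('onnxruntime')];
```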
What is the new behavior?
Coming from Issue #479, the Inference API is a user-friendly interface that allows developers to easily run their own models using the power of the low-level `onnx` Rust backend. It's based on two core components: `RawSession` and `RawTensor`.

- `RawSession`: A low-level `Supabase.ai.Session` that can execute any `.onnx` model. It's recommended for use cases that need more control over the pre/post-processing steps, like the text-to-audio example, as well as for executing `linear regression`, `tabular classification` and self-made models.
- `RawTensor`: A low-level data representation of the model input/output. Inference API Tensors are fully compatible with Transformers.js Tensors, meaning developers can keep using the high-level abstractions that `transformers.js` provides, like `.sum()`, `.normalize()` and `.min()`.

Examples:
Simple utilization:
Loading a `RawSession`:
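A minimal sketch of loading a session. `fromStorage` is named in this PR's commits; the URL-based loader and the exact signatures are assumptions:

```ts
// Sketch only: `RawSession` is assumed to be exposed globally by edge-runtime;
// `fromUrl` is an assumed loader name, while `fromStorage` comes from this PR.
const session = await RawSession.fromUrl(
  'https://huggingface.co/Supabase/gte-small/resolve/main/onnx/model_quantized.onnx',
);

// Loading from Supabase Storage instead (public and private buckets are supported):
const storedSession = await RawSession.fromStorage('models/my-model.onnx');
```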
Executing a `RawSession` with `RawTensor`:
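A sketch of running a loaded session. The tensor constructor mirrors ONNX Runtime's `(type, data, dims)` convention, but the signature and the input/output names are assumptions here:

```ts
// Assumed constructor: element type, backing typed array, and shape.
const input = new RawTensor('float32', new Float32Array([1, 2, 3, 4]), [1, 4]);

// Keys are assumed to match the model graph's input/output node names.
const { output } = await session.run({ input });

console.log(output.dims, output.data);
```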
Generating embeddings from scratch:
This example demonstrates how the Inference API can be used for complex scenarios while taking advantage of Transformers.js high-level functions.
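A sketch of the idea, assuming the `@xenova/transformers` tokenizer and a feature-extraction model whose output node is `last_hidden_state`; the pooling below is a naive mean over the sequence axis:

```ts
import { AutoTokenizer } from 'https://esm.sh/@xenova/transformers';

const tokenizer = await AutoTokenizer.from_pretrained('Supabase/gte-small');
const session = await RawSession.fromStorage('models/gte-small.onnx'); // loader from this PR

// Tokenize the input text; the tokenizer returns Transformers.js Tensors.
const { input_ids, attention_mask, token_type_ids } = await tokenizer('Hello world');

// RawTensors are interchangeable with Transformers.js Tensors, so the tokenizer
// outputs can be fed straight into the session.
const { last_hidden_state } = await session.run({ input_ids, attention_mask, token_type_ids });

// Post-process with Transformers.js high-level helpers: naive mean pooling
// over the sequence axis, then L2 normalization along the hidden dimension.
const embeddings = last_hidden_state.mean(1).normalize(2, -1);
console.log(embeddings.tolist());
```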
Self-made models
This example illustrates how users can train their own model and execute it directly from `edge-runtime`.
The model was trained to expect the following object payload:
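A sketch of such a payload, assuming a simple tabular model with named numeric features (the field names are illustrative, not taken from the PR):

```ts
// Illustrative only: the actual feature names/types depend on how the model was trained.
const payload = {
  feature_a: 5.1,
  feature_b: 3.5,
  feature_c: 1.4,
};
```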
Then the model inference can be done inside a common `Edge Function`:
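A sketch of the serving side, reusing the assumed names from above (`fromStorage` is from this PR; the tensor shape and node names are illustrative):

```ts
Deno.serve(async (req: Request) => {
  const payload = await req.json();

  const session = await RawSession.fromStorage('models/my-model.onnx');

  // Flatten the named features into the single float vector the model was trained on.
  const features = new Float32Array(Object.values(payload) as number[]);
  const input = new RawTensor('float32', features, [1, features.length]);

  const { output } = await session.run({ input });

  return Response.json({ prediction: Array.from(output.data as Float32Array) });
});
```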
TODO:

- [ ] `tryEncodeAudio()`, check out the text-to-audio example
- [ ] Tensor Image support, for `image generation`